Large-scale Feature Selection of Risk Genetic Factors for Alzheimer's Disease via Distributed Group Lasso Regression

Authors

  • Qingyang Li
  • Dajiang Zhu
  • Jie Zhang
  • Derrek Paul Hibar
  • Neda Jahanshad
  • Yalin Wang
  • Jieping Ye
  • Paul M. Thompson
  • Jie Wang
Abstract

Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer’s disease (AD). Collaborative imaging genetics studies across different research institutions have proven effective at detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD, and selecting relevant features is crucial for predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules that identify irrelevant features and remove them from the optimization. We then select the relevant group features by running group Lasso feature selection over a sequence of parameter values. Finally, we employ stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model that integrates group Lasso feature selection and detects genetic risk factors across multiple research institutions. Empirical studies are conducted on 809 subjects with 5.9 million SNPs, distributed across several individual institutions, demonstrating the efficiency and effectiveness of the proposed method.
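The last two steps named in the abstract (group Lasso over a sequence of parameters, followed by stability selection) can be illustrated with a minimal single-machine sketch. This is not the authors' DFSF: it omits the screening rules and the distribution across institutions, uses an unweighted group penalty with a plain proximal gradient solver, and runs on synthetic data. All function names, the subsample size (half the subjects), and the parameter grid are assumptions made for illustration only.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    """Block soft-thresholding: shrink each group's L2 norm by `thresh`."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > thresh:
            out[g] = (1.0 - thresh / norm) * w[g]
    return out

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5*||y - Xw||^2 + lam * sum_g ||w_g||_2."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w

def lambda_max(X, y, groups):
    """Smallest lam for which the all-zero vector is optimal (unweighted penalty)."""
    return max(np.linalg.norm(X[:, g].T @ y) for g in groups)

def stability_selection(X, y, groups, lam_fracs, n_rounds=20, seed=0):
    """Per-group selection frequency over random half-subsamples.

    For each subsample, the group Lasso is solved at every lam = frac * lambda_max;
    the score of a group is its highest selection frequency across the grid.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros((len(lam_fracs), len(groups)))
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs, ys = X[idx], y[idx]
        lmax = lambda_max(Xs, ys, groups)
        for i, frac in enumerate(lam_fracs):
            w = group_lasso(Xs, ys, groups, frac * lmax)
            selected = [np.linalg.norm(w[g]) > 1e-10 for g in groups]
            counts[i] += np.array(selected, dtype=float)
    return (counts / n_rounds).max(axis=0)

# Synthetic stand-in for SNP data: 5 groups of 3 features, signal in groups 0 and 2.
rng = np.random.default_rng(0)
groups = [np.arange(3 * k, 3 * k + 3) for k in range(5)]
X = rng.standard_normal((100, 15))
w_true = np.zeros(15)
w_true[groups[0]] = [2.0, -1.0, 1.5]
w_true[groups[2]] = [1.0, 1.0, -2.0]
y = X @ w_true + 0.1 * rng.standard_normal(100)

scores = stability_selection(X, y, groups, lam_fracs=[0.5, 0.3, 0.1], n_rounds=10)
print(scores)  # one selection frequency per group, each in [0, 1]
```

Groups that carry real signal should be selected in nearly every subsample, while noise groups enter only occasionally; ranking groups by this frequency is the idea behind the stability selection step. A real GWAS setting would need screening or a far more scalable solver, since this dense solver is only practical for small feature counts.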

Similar resources

Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure

Genome-wide association studies (GWA studies or GWAS) investigate the relationships between genetic variants such as single-nucleotide polymorphisms (SNPs) and individual traits. Recently, incorporating biological priors together with machine learning methods in GWA studies has attracted increasing attention. However, in real-world settings, nucleotide-level bio-priors have not been well-studied to date...

Identification of Genetic Polymorphism Interactions in Sporadic Alzheimer’s Disease Using Logic Regression

Objectives: Genetic polymorphism interactions are among the important factors in susceptibility to complex diseases like Alzheimer’s disease. An important goal of genetic association studies is to identify a combination of polymorphisms and measure their importance in increasing the risk of occurrence of such diseases. In this study, the feature selection approach of logic regression was used to ide...

Selection of models for the analysis of risk-factor trees: leveraging biological knowledge to mine large sets of risk factors with application to microbiome data

Motivation: Establishing a statistical association between microbiome features and clinical outcomes is of growing interest because of the potential for yielding insights into biological mechanisms and pathogenesis. Extracting microbiome features that are relevant for a disease is challenging, and existing variable selection methods are limited due to the large number of risk factor variables fro...

The role of genetics in Alzheimer’s disease

Alzheimer's disease is a progressive neurological disorder that causes the brain to shrink (atrophy) and brain cells to die. It is the most common cause of dementia and leads to a decline in thinking skills and social behaviors. Alzheimer's disease is more common in people over 65 years old. The risk of developing Alzheimer's disease and other types of dementia increases with age,...

Genome-wide Multiple Loci Mapping in Experimental Crosses by the Iterative Adaptive Penalized Regression

Genome-wide multiple loci mapping can be viewed as a variable selection problem where the major objective is to select genetic markers related to a trait of interest. This is a challenging variable selection problem because the number of genetic markers is large (often much larger than the sample size) and there is often strong linkage or linkage disequilibrium between markers. In this paper...

Journal title:
  • CoRR

Volume abs/1704.08383  Issue 

Pages  -

Publication date 2017